Genetic Risk Factors Associated With Preeclampsia and Hypertensive Disorders of Pregnancy

Key Points Question What are the genetic risk factors associated with preeclampsia and hypertensive disorders of pregnancy? Findings In this genome-wide association study, 13 novel preeclampsia- or hypertensive pregnancy–associated genetic loci were discovered. Seven loci are located near genes previously associated with blood pressure traits, and several harbor genes involved in the development of placenta, remodeling of uterine spiral arteries, and kidney function. Meaning The findings further strengthen the known association between cardiovascular health and preeclampsia and provide new targets for future research of preeclampsia pathophysiology, including genes involved in placental development and kidney function.

The FINNPEC study is a nationwide preeclampsia case-control cohort recruited from five university hospitals in Finland. The cohort consist of both prospectively and retrospectively recruited women, as well as spouses and neonates of the participating women in the prospective arm of the cohort. All participants are of Finnish ancestry. Blood samples for DNA extraction were collected from all the participants, and first and third trimester serum samples as well as placental samples from a subset of the women. In addition, extensive clinical information collected from hospital records of the participants' obstetric histories, pregnancy complications and outcomes, and laboratory, blood pressure and proteinuria measurements during pregnancy, as well as information on delivery and the newborn have been obtained. Women with multiple or ovum donation pregnancy, age below 18 years, or an inability to provide informed consent based on information in Finnish or Swedish were excluded. Diagnoses were established based on medical records and confirmed independently by a research nurse and a study physician. A total of 1479, 1689 and 1564 FINNPEC cases of preeclampsia, preeclampsia or other maternal hypertensive disorder and preeclampsia or fetal growth restriction, respectively, were available for the maternal meta-analyses. A full list of the FINNPEC investigators can be found in Supplement 5. In FinnGen the study phenotypes are based on ICD-10, ICD-9 and ICD-8 codes as described in eTable 1. The FinnGen project combines genotype data from Finnish nationwide biobanks that are linked with digital health records from the national hospital discharge (from 1968 onwards), cancer (1953-), death (1969-) and medication reimbursement (1995-) registers. All participants are of Finnish ancestry. The Data Freeze 6 used in this study contains the genomic and health record data for 6% of adult Finnish women. In FinnGen, 88028 women with offspring were included in the study. FinnGen contributed a total of 4285, 9427 and 6464 cases to the preeclampsia, preeclampsia or other maternal hypertensive disorder and preeclampsia or fetal growth restriction phenotypes, respectively. Steinthorsdottir et al. 2  In EstBB the study phenotypes are based on ICD-10, ICD-9 and ICD-8 codes as described in eTable 1. EstBB is a population-based biobank, which closely reflects the age, sex and geographical distribution of the Estonian population. ICD-code information is obtained from the Estonian Causes of Death, Estonian Cancer, Estonian Tuberculosis and Estonian Health Insurance Fund registries, and also from the databases of Tartu University Hospital and North Estonia Medical Centre. The case definition for the EstBB 200K Data Freeze was similar to FinnGen, but only the ICD-10 codes were used. Total case counts in EstBB for the preeclampsia, preeclampsia or other maternal hypertensive disorder and preeclampsia or fetal growth restriction phenotypes were 1464, 4084 and 2772, respectively.

Note 3: Survival analyses between preeclampsia and other FinnGen endpoints
The goal of the analysis is to study the association between an exposure endpoint and an outcome endpoint.
Associations between endpoints are calculated using the default FinnGen survival analysis workflow described below, which follows the approach described in the NB-COMO study (https://plana-ripoll.github.io/NB-COMO/). Start of follow-up begins from date 1998-01-01, which is chosen as this date there is complete coverage for all registries. End of follow-up is defined as 2021-12-31. Prevalent cases (i.e. individuals that have been diagnosed with the outcome endpoint before 1998-01-01) were removed from the study. Only incident cases are considered. If the date of diagnoses for the exposure endpoint happens before 1998-01-01, it is assumed that it happened on 1998-01-01. Only those endpoint pairs with at least 10 individuals for each cell of the 2x2 contingency table between endpoint pairs are considered. Also, at least 25 individuals having the outcome endpoint, and endpoints must not overlap so that endpoints are not descendants of one another endpoint in the tree hierarchy or have overlapping underlying ICD codes.
To improve computational speed, a case-cohort design is used. Briefly, from the original cohort, a subcohort at The analysis is performed using Cox regression with a time-varying covariate, weighted by the inverse of the sampling probability to account for the case-cohort design. Robust standard error was used. The model is defined as:

Surv(time,outcome_endpoint) ~ exposure_endpoint + birth_year + sex
Time is calculated as (date end of follow-up -date entry in the study) defined in data pre-processing (except for individuals diagnosed with the exposure endpoint where time is split from entry till diagnosis and from diagnosis till the end of follow up, see below). Exposure endpoint is treated as a time-varying covariate, so that an individual is unexposed (value of the variable is set to 0) from 1998-01-01 until the diagnoses of the exposure endpoint and exposed (value of the variable is set to 1) after that. Lagged hazard ratios are computed with the following follow-up time windows: < 1 year, between 1 and 5 years, between 5 and 15 years. If an outcome endpoint occurs outside the time-widow, the individual with the disease endpoint is included in the analysis, but the outcome endpoint is not considered (i.e. variable is set to 0). The Cox regression is implemented using the lifelines library.
Bonferroni corrected p-value threshold was 0.00001.

Note 4: Genotyping and imputation
FINNPEC: Genomic DNA was extracted from whole blood using the NucleoSpin Blood XL DNA extraction kit  Technology Centre, University of Helsinki. Genotype calls were made using GenCall or zCall for Illumina and AxiomGT1 algorithm for Affymetrix data. Genotypes with HWE p-value p<1x10 -6 , minor allele count <3 and genotyping success rate <98 % were removed. Samples with ambiguous sex, high genotype missingness > 5% and those that were outliers in population structure (> 4 SD from mean on first two dimensions) were omitted.
Samples were pre-phased with Eagle 2.3.5 using 20,000 conditioning haplotypes. Genotypes were imputed with Beagle 4.1 using SiSu v3 imputation reference panel.
Estonian Biobank: All EstBB participants were genotyped at the Core Genotyping Lab of the Institute of Genomics, University of Tartu, using Illumina GSAv1.0, GSAv2.0, and GSAv2.0_EST arrays. Samples were genotyped and PLINK format files were created using Illumina GenomeStudio v2.0.4. Individuals were excluded from the analysis if their call-rate was < 95% or if sex defined based on heterozygosity of X chromosome did not match sex in phenotype data. Before imputation, variants were filtered by call-rate < 95%, HWE p-value < 1e-4 (autosomal variants only), and minor allele frequency < 1%. All variants were changed to be from TOP strand using GSAMD-24v1-0_20011747_A1-b37.strand.RefAlt.zip files from https://www.well.ox.ac.uk/~wrayner/strand/ webpage. Prephasing was done using the Eagle v2.3 software (number of conditioning haplotypes Eagle2 uses when phasing each sample was set to: --Kpbwt=20000) and imputation was done using Beagle v.28Sep18.793 with effective population size ne=20,000. Population specific imputation reference of 2297 WGS samples was used 3 .

Note 5: Association and meta-analysis
To minimize spurious associations due to population structure, principal component (PC) analysis was first conducted in each cohort to define ten most significant PCs. The null models of the GWAS analyses were adjusted for maternal age at birth and the first 10 PCs (FINNPEC), age, genotyping batches and the first 10 PCs (FinnGen), or year of birth and the first 10 PCs (EstBB). FINNPEC and EstBB genotypes were available in reference genome build hg37. FinnGen summary statistics were lifted over from hg38 to hg37 reference genome build using UCSC liftOver 4 to match the other cohorts.
Genomic inflation factor estimates were calculated with 'LD Score regression' (LDSC) software 5 . Final summary statistics were then created by rerunning the meta-analysis by providing the LD Score regression intercept values for the METAL "GENOMIC CONTROL" option. Genome-wide significance was set to pvalue < 5 × 10 -8 . The meta-analysis was conducted by two analysts independently and summary statistics were compared for consistency. Only variants available in at least two cohorts were used for downstream analysis. Formatting and preparation of the association summary statistics data for downstream analysis was managed with the workflow management software STAPLER 6 .

Note 6: Annotation of loci
We defined annotated loci as the lead variants and the surrounding +-1 Mbp area. To further narrow down the set of plausible variants, we fine-mapped each locus discovered in the three meta-analyses. We first extracted the summary statistics of each locus, and then applied the FinnGen finemapping pipeline (available at https://github.com/FINNGEN/finemapping-pipeline, accessed during 2/2/2022) with default parameters. In brief, the pipeline calculates linkage disequilibrium within the regions of interest with LDstore2 7 using FinnGen samples and generates 99 % credible sets using SuSie 8 .
To identify putative candidate genes we first used the web-based FUMA platform 9 , which is intended for functionally annotating GWAS findings to prioritize most likely causal variants and genes via using information the GWAS catalog 10 . In addition, we assessed if the variants had high CADD score 11 . ANNOVAR 12 and MAGMA 13 are integrated as part of FUMA, and were used for performing gene-based analysis and creating functional annotation for the meta-analysis results, respectively. In the MAGMA gene-based analysis multiple genetic markers are analyzed simultaneously to determine their joint effect, which is in contrast with GWAS analysis where the association of each individual variant with the studied phenotype is tested separately. In the absence of functional evidence, the genes most proximal to the lead variant in each locus were prioritized.
Central approach for identifying putative causal genes and obtaining novel understanding of underlying biology was our careful manual curation of the biological function of the functionally implicated or proximal genes in each associated loci via conducting literature and GWAS catalogue searches and examining GenBank 14 and UniProt 15 databases.

Note 7: Genetic correlations
To further describe the correlation of the findings with other disorders -possibly not described in earlier literature -we conducted a PheWAS analysis for the lead variants using 2,861 phenotypes provided in FinnGen Release 6. In this analysis significance threshold was adjusted via Bonferroni correction (0.05/2,861) to p-value 1.75*10 -5 .
We used the LDSC software 5,16 to estimate regression-derived SNP-based heritability and to evaluate genetic correlation ( ) between preeclampsia, preeclampsia or other maternal hypertensive disorder and preeclampsia or fetal growth restriction and 894 other previously published phenotypes (eTable 6). The phenotypes were selected to closely match the (currently defunct) LDhub 17 dataset, combined with additional circulating metabolite data from Kettunen et al. 18 . Significance threshold was adjusted using Bonferroni correction (0.05/894) to p-value < 5.6*10 -5 . of FINNPEC cohort samples. To assess the effect of PRS on the risk of preeclampsia, we defined two PRS groups: top decile and bottom 90 % group as a reference. We compared the women with high PRS with the reference group using logistic regression. The regression model included age, parity, BMI, and the blood pressure of the first trimester. We also studied the change of Nagelkerke's R 2 values to assess the change in goodness of fit of different models when PRS information was incorporated to known risk factors of preeclampsia: age, parity, BMI and blood pressure of the first trimester.

Note 8: Polygenic risk scores (PRS)
FINNPEC has extensive clinical information on studied pregnancies, and was therefore selected for the target cohort for calculating PRS. Regression model comparing the top 10 % preeclampsia-PRS compared to bottom 90 % showed statistically significant association with preeclampsia and with preeclampsia with severe symptoms as shown in eTables 8-10. Similar association was shown between PRS of preeclampsia or other maternal hypertensive and both preeclampsia, preeclampsia with severe features and the combinatory phenotype of preeclampsia or other maternal hypertensive disorder. In addition, we tested whether similar associations could be seen using PRS developed for systolic blood pressure, as in earlier studies 56,57 . Our results show no statistical association, possibly due to incorporation of first trimester systolic blood pressure measurement to the model. It should be noted that FINNPEC preeclampsia patients were recruited from Finnish University hospitals and may therefore include cases, which represent more severe symptoms than the general obstetric population. Due to being a case-control cohort, FINNPEC contains a larger proportion of cases compared to the other cohorts used in this work. Therefore the odds ratios and the improvement in Nagelkerke's R 2 values we report may be inflated, Note 9.1. PGR mediates the effects of progesterone, an essential hormone in pregnancy that maintains pregnancy and promotes endometrial maturation, angiogenesis, vasodilation and placentation 34 . Transcriptomic profiling has revealed perturbed PGR signaling in the decidual endometrium of previously preeclamptic women 35 . Mutations in the other plausible candidate gene in the same locus, TRPC6, have impact on various biochemical pathways throughout the body. Variants in this gene have been shown to cause familial focal segmental glomerulosclerosis 36 , and to affect vascular smooth muscle contractility 37 , which may subsequently contribute to the risk of cardiac hypertrophy, heart failure 38 and idiopathic pulmonary arterial hypertension 39 .
Also, the gene exhibits enhanced placental expression, and the TRPC6 knock-out mice present with structural changes of the placenta and reduced litter sizes 40 .
PZP is a protease inhibitor that prevents the activity of all four classes of proteases and it stabilises misfolded proteins, which have been shown to accumulate in preeclampsia and contribute to its pathophysiology [41][42][43][44][45] . For instance, PZP has been shown to inhibit the aggregation of amyloid beta peptide 46 -an important driver in plaque formation in several disorders, including preeclampsia, age-related macular degeneration (particularly in women), and Alzheimer disease. Also importantly, the protein has been suggested to clear out pro-inflammatory cytokines and inhibit the effects of T helper cells, thus preventing further inflammation, oxidative stress and placental dysfuntion 47 . PZP is also highly expressed in late-pregnancy serum and its upregulation during pregnancy represents a major maternal adaptation that helps to maintain extracellular proteostasis during gestation 46 . Furthermore, balanced expression of Glycodelin A (GdA) and its carrier protein PZP in the decidua seems crucial for a successful ongoing pregnancy 48 .
The ACTN4 gene encodes for an actin-binding protein with multiple roles in different cell types, and is well known for causing focal segmental glomerulosclerosis [49][50][51] . The gene product is involved in the maintenance of cytoskeletal structure, modulating cell motility and regulating endothelial cells 52,53 , and may play a role in placentation 54 . ACTN4 has been suspected to act as a regulator of trophoblast proliferation and differentiation during early pregnancy, and has been shown to have reduced expression in severe preeclampsia 54 . Specifically, the lack of expression appears to prevent cytotropboblast differentiation into interstitial extravillous cytotrophoblasts, possibly compromising the remodelling of the spiral arteries. In addition, the reduced ACTN4 levels have been suggested to activate endothelial cell apoptosis 55  only 6 % of the offspring of mated ACTN4 +/mice were homozygous for the dysfunctional gene, instead of the expected 25 %. Whether the reduced litter size observed in this study was caused by dysfunctional placental development remains unclear since the placentas were unfortunately not examined. Further functional studies are needed to gain evidence of the potential causal role of ACTN4 in preeclampsia and to uncover pathophysiological mechanisms behind the association of this locus. Note 9.2. In pregnancy, HLA-G, -F, -E and -C are potentially directly involved in tolerance induction by recognition of fetal trophoblast cells by maternal immune cells, and the classical HLA genes likely also play a role in the maintenance of peripheral tolerance [21][22][23] . The association lies within the major susceptibility locus for psoriasis (PSORS1), known for accounting much of the genetic risk for psoriasis 24,25 . The two top outliers of the HLA-region in MAGMA test, 'Coiled-Coil Alpha-Helical Rod Protein' (CCHCR1) and 'Psoriasis Susceptibility 1 Candidate 2' (PSORS1C2), have indeed been previously suggested as risk genes for psoriasis [26][27][28] .
Additionally, evidence shows PSORS1C2 effects on apolipoprotein B and blood protein levels 29,30 . Psoriasis increases the risk of adverse pregnancy outcomes and specifically preeclampsia with an odds ratio of 1.5 [30][31][32][33] .

Note 10: Study of preeclampsia or indication of fetal growth restriction.
As fetal growth restriction is a common symptom of preeclampsia, we also studied the combinatory phenotype defined as preeclampsia or fetal growth restriction (as mother's diagnosis). The available sample sizes are shown in eTable S11. The analysis workflow was similar to the preeclampsia and preeclampsia or other maternal hypertensive disorder and preeclampsia phenotypes described in the main text and eAppendix Notes 1-2. Workflow of the preeclampsia or fetal growth restriction is visualized in eFigure 6.
Preliminary meta-analysis results showed evidence of slight inflation in the meta-analysis test statistics based on the calculated LD Score intercept 1.0147 (SE 0.0077). This inflation was then corrected by rerunning the metaanalysis with genomic correction. Meta-analysis uncovered two loci (eTable 12, eFigures 7-9). The novel locus at 9q22 detected in meta-analysis of the preeclampsia phenotype was also detected when analyzing preeclampsia or fetal growth restriction phenotype, although the lead variant here was rs7470773 instead of the rs7862828 in preeclampsia. Interestingly, the lead variant in the meta-analysis of preeclampsia or fetal growth restriction is in only modest LD (r 2 0.34) with the preeclampsia meta-analysis lead variant, suggesting that the mechanisms behind the association might differ between the two phenotypes. The preeclampsia or fetal growth eTable 1. International Classification of Diseases and Related Health Problems (ICD) codes for the FinnGen and Estonian biobank (only ICD-10 in the case of Estonian biobank) phenotypes of hypertensive pregnancy.